vignettes/exploring_subgraph_results.Rmd

First, we run the subgraph-finding algorithm across the first 10,000 association pairs.

    subgraphs <- association_pairs %>%
      head(10000) %>%
      calculate_subgraph_structure()

    subgraphs %>%
      select(-subgraphs) %>%
      head() %>%
      knitr::kable()

| step | n_edges | strength | n_nodes_seen | n_subgraphs | max_size | rel_max_size | avg_size | avg_density | n_triples |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.0916161 | 2 | 1 | 2 | 1.0000000 | 2.0 | 1.0000000 | 0 |
| 2 | 2 | 0.0836374 | 3 | 1 | 3 | 1.0000000 | 3.0 | 0.6666667 | 1 |
| 3 | 3 | 0.0749584 | 5 | 2 | 3 | 0.6000000 | 2.5 | 0.8333333 | 1 |
| 4 | 4 | 0.0729044 | 6 | 2 | 3 | 0.5000000 | 3.0 | 0.6666667 | 2 |
| 5 | 5 | 0.0728354 | 6 | 2 | 3 | 0.5000000 | 3.0 | 0.8333333 | 2 |
| 6 | 6 | 0.0704478 | 7 | 2 | 4 | 0.5714286 | 3.5 | 0.6666667 | 2 |
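
The size columns in this table are related in a simple way: `rel_max_size` equals `max_size / n_nodes_seen` and `avg_size` equals `n_nodes_seen / n_subgraphs` (step 3, for example, gives 3 / 5 = 0.6 and 5 / 2 = 2.5). The sketch below is not the implementation behind `calculate_subgraph_structure()`. It is a minimal base-R illustration, on a made-up six-edge list, of how such per-step summaries could be computed by adding edges one at a time in order of decreasing strength and tracking connected components. The function name `summarise_steps`, the toy edges, and the density definition (edges divided by possible node pairs, averaged over subgraphs) are assumptions, although that definition is consistent with the rows shown above.

```r
# Hypothetical toy edge list, already sorted by decreasing strength.
edges <- data.frame(
  a = c("A", "B", "D", "E", "A", "C"),
  b = c("B", "C", "E", "F", "C", "G"),
  strength = c(0.092, 0.084, 0.075, 0.073, 0.072, 0.070),
  stringsAsFactors = FALSE
)

# Add edges one by one and summarise the connected components after each step.
summarise_steps <- function(edges) {
  membership <- character(0)  # names are node ids, values are component labels
  rows <- vector("list", nrow(edges))

  for (i in seq_len(nrow(edges))) {
    a <- edges$a[i]
    b <- edges$b[i]

    # A node seen for the first time starts as its own component.
    for (node in c(a, b)) {
      if (!node %in% names(membership)) membership[node] <- node
    }

    # The new edge merges the components of its endpoints (no-op if equal).
    membership[membership == membership[[b]]] <- membership[[a]]

    sizes <- table(membership)
    size_n <- as.numeric(sizes)

    # Count the edges added so far per component, using each edge's first endpoint.
    edge_labels <- factor(membership[edges$a[seq_len(i)]], levels = names(sizes))
    edge_n <- as.numeric(table(edge_labels))

    rows[[i]] <- data.frame(
      step = i,
      n_nodes_seen = length(membership),
      n_subgraphs = length(size_n),
      max_size = max(size_n),
      rel_max_size = max(size_n) / length(membership),
      avg_size = mean(size_n),
      avg_density = mean(edge_n / choose(size_n, 2))
    )
  }

  do.call(rbind, rows)
}

res <- summarise_steps(edges)
print(res)
```

Relabelling the whole membership vector on every merge is fine for a toy example but scales poorly; a union-find structure or a graph library would be the usual choice for thousands of edges.
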
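Next, we can investigate how the subgraph structure changes over the course of the search. The vertical lines mark the last step with the smallest relative maximum subgraph size (orange-red), the last step with the most subgraphs (green), and the last step with the most triples (blue).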

    min_rel <- subgraphs %>%
      filter(rel_max_size == min(rel_max_size)) %>%
      tail(1)

    max_num_subgraphs <- subgraphs %>%
      filter(n_subgraphs == max(n_subgraphs)) %>%
      tail(1)

    max_num_triples <- subgraphs %>%
      filter(n_triples == max(n_triples)) %>%
      tail(1)

    subgraphs %>%
      # filter(rel_max_size < 0.5) %>%
      select(
        strength,
        n_subgraphs,
        max_size,
        rel_max_size,
        avg_density,
        n_triples,
        step
      ) %>%
      pivot_longer(-step) %>%
      ggplot(aes(x = step, y = value)) +
      geom_step() +
      geom_vline(xintercept = min_rel$step, color = 'orangered') +
      geom_vline(xintercept = max_num_subgraphs$step, color = 'forestgreen') +
      geom_vline(xintercept = max_num_triples$step, color = 'steelblue') +
      facet_grid(rows = vars(name), scales = "free_y")
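
The `filter(...) %>% tail(1)` pattern used for the three landmark steps keeps every row that ties for the extreme value and then takes the last of them, so each vertical line sits at the latest step where the extreme is reached. A small sketch of that tie-breaking, using only the six rows of the summary table printed earlier (the object name `toy` is an assumption for illustration):

```r
library(dplyr)

# The step, n_subgraphs and rel_max_size columns from the first six rows
# of the summary table.
toy <- data.frame(
  step = 1:6,
  n_subgraphs = c(1, 1, 2, 2, 2, 2),
  rel_max_size = c(1, 1, 0.6, 0.5, 0.5, 0.5714286)
)

# Steps 4 and 5 tie for the smallest rel_max_size; tail(1) keeps step 5.
last_min_rel <- toy %>%
  filter(rel_max_size == min(rel_max_size)) %>%
  tail(1)

# Steps 3 to 6 tie for the most subgraphs; tail(1) keeps step 6.
last_max_subgraphs <- toy %>%
  filter(n_subgraphs == max(n_subgraphs)) %>%
  tail(1)

last_min_rel$step
last_max_subgraphs$step
```

Replacing `tail(1)` with `head(1)` would mark the first step at which each extreme is reached instead.
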

    node_info <- virus_host_viruses %>%
      rename(id = virus_id) %>%
      mutate(color = ifelse(type == "RNA", "orangered", "steelblue"))

    visualize_subgraph_structure(
      association_pairs,
      node_info = node_info,
      subgraph_results = subgraphs,
      trim_subgraph_results = TRUE
    )
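
To make the shape of `node_info` concrete, here is a minimal sketch of the same `rename()` and `mutate()` steps on a made-up three-row table. The virus names and the `"DNA"` type value are hypothetical; the only things taken from the code above are the `virus_id` and `type` columns and the colour rule.

```r
library(dplyr)

# Hypothetical stand-in for virus_host_viruses.
toy_viruses <- data.frame(
  virus_id = c("virus_a", "virus_b", "virus_c"),
  type = c("RNA", "DNA", "RNA"),
  stringsAsFactors = FALSE
)

# Same transformation as above: one row per node, keyed by `id`,
# with RNA viruses coloured orange-red and everything else steel blue.
toy_node_info <- toy_viruses %>%
  rename(id = virus_id) %>%
  mutate(color = ifelse(type == "RNA", "orangered", "steelblue"))

print(toy_node_info)
```
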